Abstract: The Invertible Bloom Lookup Table (IBLT) is a data structure which supports insertion, deletion, retrieval, and listing operations on key-value pairs. The IBLT can be used to realize efficient set reconciliation for database synchronization. The most notable feature of the IBLT is the complete listing operation of the key-value pairs, which is based on an algorithm similar to the peeling algorithm for low-density generator-matrix (LDGM) codes. In this paper, we present a stopping set (SS) analysis for the IBLT which reveals the finite length behavior of the listing failure probability. The key to the analysis is the enumeration of stopping matrices of a given size. We derive a novel recursive formula that enables computationally efficient enumeration. An upper bound on the listing failure probability based on the union bound accurately captures the error floor behavior. It will be shown that, in the error floor region, the dominant stopping sets have size 2. We propose a simple modification of the hash functions, called SS avoiding hash functions, which prevents the occurrence of stopping sets of size 2.

I. Introduction 

The Invertible Bloom Lookup Table (IBLT) is a recently developed data structure which supports insertion, deletion, retrieval, and listing operations on key-value pairs [...]. The IBLT can be seen as a natural extension of the Bloom filter [...], which handles set membership queries. The most notable feature of the IBLT is the complete listing operation of the key-value pairs, which is based on an algorithm similar to the peeling algorithm [12] for low-density generator-matrix (LDGM) codes.

The listing operation enables us to use the IBLT as the basis of an efficient set reconciliation algorithm that requires only a small amount of communication. Set reconciliation is the process of synchronizing the contents of two sets held at two distinct locations. It can be used for database synchronization, memory synchronization, and the implementation of Biff codes [...]. The IBLT is fairly simple to implement and scales naturally to multiple servers, which is a desirable feature for extremely large data sets.

The paper by Goodrich and Mitzenmacher [...] provides a detailed analysis of the IBLT, including the optimization of the number of hash functions to minimize the retrieval failure probability. They also presented asymptotic thresholds for accurate recovery by using known results on 2-cores of random hypergraphs. Furthermore, some fault tolerance features of the IBLT are studied extensively.

For designing practical applications, it is beneficial to know not only the asymptotic behavior of the listing process but also its finite length performance. In particular, predicting the error floor of the listing failure probability is required to guarantee the accuracy of a listing process. It is known that stopping sets [11] dominate the finite length performance of LDGM codes over erasure channels. Stopping sets are just as crucial for the IBLT as they are for LDGM codes. In this paper, we present a stopping set analysis for the IBLT which unveils the finite length behavior of the listing failure probability.

This paper is organized as follows. Section II introduces the notation and definitions required for this paper and gives a brief review of the IBLT. Section III provides an upper bound on the listing failure probability. An enumeration method for the number of stopping matrices, based on a recursive formula, is the heart of the efficient evaluation of the upper bound. Section IV presents results of computer experiments. It will be shown that, in the error floor region, stopping sets of size 2 become dominant. In Section V, a class of hash functions called SS avoiding hash functions is proposed. These functions eliminate stopping sets of size 2 and thereby lower the error floor.



II. Preliminaries 



A. Bloom Filter 



Before going into the details of the IBLT, we explain the structure of the original Bloom filter (BF), which is the basis of the IBLT. Assume that we have a binary array $T$ and $k$ hash functions $h_1, \dots, h_k$. The binary array $T$ is initially set to all zeros. When an item $x$ is inserted, we set $T[h_i(x)] = 1$ for $i \in [1, k]$. The notation $[\alpha, \beta]$ denotes the set of consecutive integers from $\alpha$ to $\beta$. This process is called the Insert(x) operation. The set membership query on $y$ checks whether $y$ is in the BF or not. The LookUp(y) operation returns YES if $T[h_i(y)] = 1$ for all $i \in [1, k]$; otherwise it returns NO. The operations Insert(x) and LookUp(y) can be carried out in $O(k)$ time. Note that the LookUp(y) operation may yield a false positive; i.e., it may return YES when $y$ is not in the BF. Minimizing this false positive probability in terms of the number of hash functions is an important topic in the study of the BF [...]. An appropriately designed BF provides a highly space-efficient set membership query system with a reasonably small false positive probability.
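The two BF operations can be summarized in a short Python sketch. It is illustrative only. The hash functions are assumed to be SHA-256 digests of the pair $(i, x)$ reduced modulo $m$, because the text does not prescribe them. Cells are indexed from 0 rather than 1, and the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a binary array T and k hash functions (0-based cells)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.T = [0] * m  # binary array, initially all zero

    def _h(self, i: int, x: bytes) -> int:
        # Hypothetical i-th hash function: SHA-256 of (i, x), reduced modulo m.
        digest = hashlib.sha256(i.to_bytes(4, "big") + x).digest()
        return int.from_bytes(digest, "big") % self.m

    def insert(self, x: bytes) -> None:
        # Insert(x): set T[h_i(x)] = 1 for every i; O(k) time.
        for i in range(self.k):
            self.T[self._h(i, x)] = 1

    def lookup(self, y: bytes) -> bool:
        # LookUp(y): YES iff all k positions are set (false positives are possible).
        return all(self.T[self._h(i, y)] == 1 for i in range(self.k))


bf = BloomFilter(m=64, k=3)
bf.insert(b"alice")
print(bf.lookup(b"alice"))  # True: an inserted item is never reported absent
```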



B. IBLT and its Operations 

As in the case of the BF, $k$ hash functions $h_1, \dots, h_k$ are used in the IBLT. Instead of a binary array, the IBLT utilizes an array of cells $T[1], \dots, T[m]$. A cell $T[i]$ consists of three fields, called Count, KeySum, and ValueSum, which are denoted by $T[i]$.Count, $T[i]$.KeySum, and $T[i]$.ValueSum. An input to the IBLT is a key-value pair (Key, Value). The Count field holds the number of entries inserted into the cell. The KeySum (resp. ValueSum) field stores the exclusive OR of the keys (resp. values) of those entries. The contents of all the cells are initialized to zero at the beginning.

The IBLT supports four operations: Insert(x, y), Delete(x, y), Get(x), and ListEntries(). The operation Insert(x, y) stores a key-value pair $(x, y)$ in the IBLT. In an insertion process, the key $x$ (resp. value $y$) is added (over $\mathbb{F}_2$) to the KeySum (resp. ValueSum) field of $T[h_i(x)]$ for $i \in [1, k]$. Namely, $T[h_i(x)].\mathrm{KeySum} = T[h_i(x)].\mathrm{KeySum} \oplus x$ and $T[h_i(x)].\mathrm{ValueSum} = T[h_i(x)].\mathrm{ValueSum} \oplus y$. At the same time, the Count field of $T[h_i(x)]$ is incremented as $T[h_i(x)].\mathrm{Count} = T[h_i(x)].\mathrm{Count} + 1$. The operation Delete(x, y) removes the key-value pair $(x, y)$ from the IBLT. The process is the same as that of Insert(x, y), except that the counter is decremented. The operation Get(x) retrieves the value corresponding to the key $x$. This operation is realized as follows. If there exists $i \in [1, k]$ satisfying $T[h_i(x)].\mathrm{Count} = 1$, then Get(x) returns $T[h_i(x)]$.ValueSum. Otherwise, Get(x) declares the failure of the operation.

The last operation, ListEntries(), outputs all the key-value pairs in the IBLT by sequentially removing the entries found in cells whose counter value equals one. The details of the process are as follows. We first look for $i \in [1, m]$ satisfying $T[i].\mathrm{Count} = 1$. If there exists $i^*$ satisfying $T[i^*].\mathrm{Count} = 1$, the key-value pair ($T[i^*]$.KeySum, $T[i^*]$.ValueSum) is registered in the output list. Then Delete($T[i^*]$.KeySum, $T[i^*]$.ValueSum) is executed. This process is iterated until no cell with a counter value equal to one can be found. It should be remarked that, in some cases, ListEntries() fails to list all the entries in the IBLT. This is because the process can stop while the IBLT is still non-empty, namely when every non-empty cell $T[i]$, $i \in [1, m]$, has a counter value larger than one. This failure event is called a listing failure. An IBLT should be designed so that listing failures occur as rarely as possible.
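The four operations and the peeling-based ListEntries() can be combined in a minimal Python sketch. It rests on several assumptions. Keys and values are integers, and the KeySum and ValueSum updates are bitwise XOR. The hash functions are hypothetical SHA-256-based functions that place the $i$-th hash value in the $i$-th of $k$ subtables, following the layout of Section II-C. Cells are 0-based, and all names are illustrative. Get(x) follows the description above literally and does not compare KeySum with $x$.

```python
import hashlib


class IBLT:
    """Minimal IBLT: k subtables of ell cells each (m = k * ell), 0-based indices.

    Each cell is [Count, KeySum, ValueSum]; keys and values are non-negative ints
    (keys below 2**128), and the sums are bitwise XOR, i.e. addition over F_2.
    """

    def __init__(self, ell: int, k: int):
        self.ell, self.k = ell, k
        self.T = [[0, 0, 0] for _ in range(k * ell)]

    def _h(self, i: int, x: int) -> int:
        # Hypothetical i-th hash: SHA-256 based, landing in the i-th subtable.
        d = hashlib.sha256(i.to_bytes(4, "big") + x.to_bytes(16, "big")).digest()
        return i * self.ell + int.from_bytes(d, "big") % self.ell

    def _update(self, x: int, y: int, delta: int) -> None:
        for i in range(self.k):
            cell = self.T[self._h(i, x)]
            cell[0] += delta  # Count
            cell[1] ^= x      # KeySum
            cell[2] ^= y      # ValueSum

    def insert(self, x: int, y: int) -> None:
        self._update(x, y, +1)

    def delete(self, x: int, y: int) -> None:
        self._update(x, y, -1)

    def get(self, x: int):
        # ValueSum of the first cell of x with Count == 1; None signals failure.
        for i in range(self.k):
            cell = self.T[self._h(i, x)]
            if cell[0] == 1:
                return cell[2]
        return None

    def list_entries(self):
        """Peel cells with Count == 1 (destroys the table); returns (pairs, success)."""
        out = []
        while True:
            pure = next((c for c in self.T if c[0] == 1), None)
            if pure is None:
                break
            x, y = pure[1], pure[2]
            out.append((x, y))
            self.delete(x, y)
        success = all(c[0] == 0 for c in self.T)  # False signals a listing failure
        return out, success


table = IBLT(ell=20, k=3)
for key in range(1, 11):
    table.insert(key, key * key)
entries, ok = table.list_entries()
```

When only valid insertions have been made, every listed pair is a genuine entry. A success flag of False corresponds to a listing failure.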

C. Probabilistic Model 

It is clear that the probability of the listing failure event, which is called the listing failure probability, depends on the probabilistic model for keys and hash functions. In this paper (except for Section V), we assume the following model for keys and hash functions. The hash functions $h_1, \dots, h_k$ have domain $\{0,1\}^b$. The keys of the entries to be stored are independent random variables uniformly distributed over $\{0,1\}^b$. The number of entries is assumed to be $n$. The hash functions are assumed to be uniform, i.e., $h_i(x)$ is uniformly distributed over the range of $h_i$ when $x \in \{0,1\}^b$ obeys the uniform distribution. The $m$ cells are split into $k$ subtables, each of size $m/k$, and each hash function uniformly selects a cell in its own subtable. In other words, the range of $h_i$ is $[(i-1)(m/k)+1, \; i(m/k)]$.
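Under this model, the hash values can be simulated in experiments without modeling the keys themselves. The sketch below makes the following assumptions. $k$ divides $m$, and $\ell = m/k$ denotes the subtable size. The $k$ hash values of an entry are independent, as in the uniform state-matrix model of Section III-B. The sketch also builds the binary state matrix used in Section III. The helper names are hypothetical.

```python
import random


def sample_hashes(n: int, k: int, ell: int, rng: random.Random):
    """Cell indices (1-based) of n entries under the model of Section II-C.

    Assumes m = k * ell and draws h_i uniformly from [(i-1)*ell + 1, i*ell],
    independently for every entry and every i (hypothetical helper).
    """
    return [tuple((i - 1) * ell + rng.randint(1, ell) for i in range(1, k + 1))
            for _ in range(n)]


def state_matrix(cols, m: int):
    """m x n binary matrix B with B[s-1][t] = 1 iff entry t is hashed to cell s."""
    B = [[0] * len(cols) for _ in range(m)]
    for t, cells in enumerate(cols):
        for s in cells:
            B[s - 1][t] = 1
    return B


cols = sample_hashes(n=5, k=3, ell=4, rng=random.Random(1))
B = state_matrix(cols, m=12)
```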

III. Upper Bound on Listing Failure Probability 

In this section, we derive an upper bound on the listing failure probability. A listing failure occurs when a stopping set [11], which is a combinatorial substructure of a matrix, appears. To evaluate the listing failure probability, we need to enumerate the stopping matrices of a given size. A stopping matrix is a matrix with no row of weight one. This corresponds to the situation where no cell has a counter value equal to one.

A. Enumeration of Stopping Matrix 

The state of an IBLT can be represented by an $m \times n$ binary matrix $B$, called the state matrix, where $m = \ell k$. A row of $B$ corresponds to a cell and a column corresponds to an entry. The matrix $B$ can be divided into $k$ disjoint blocks of size $\ell \times n$. If the $(s, t)$-element of the $u$-th block of $B$ is one, then the $t$-th entry is hashed to the $s$-th cell of the $u$-th subtable by the $u$-th hash function. Suppose that a sub-matrix $M'$ consisting of several columns of $B$ has no row of weight one. In such a case, ListEntries() fails to list all the entries in the table because $M'$ cannot be resolved in the peeling process. A binary matrix $M'$ without a row of weight one is said to be a stopping matrix. The existence of a stopping matrix in $B$ is a necessary and sufficient condition for the failure of the peeling process [11], [12].
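The peeling process and the stopping-matrix condition can be illustrated on a state matrix stored column-wise. In this hypothetical sketch, `cols[t]` lists the rows of $B$ where column $t$ has a 1. The columns that peeling leaves unresolved always form a stopping set, so peeling fails exactly when that set is non-empty.

```python
from collections import Counter


def peel(cols):
    """Peeling on the state matrix B, given column-wise.

    cols[t] lists the rows (cells) where column t of B has a 1. A column is
    resolved when one of its rows has weight one among the unresolved columns.
    Returns the set of columns that can never be resolved.
    """
    remaining = set(range(len(cols)))
    weight = Counter(c for t in remaining for c in cols[t])  # row weights
    progress = True
    while progress:
        progress = False
        for t in sorted(remaining):
            if any(weight[c] == 1 for c in cols[t]):
                remaining.discard(t)  # output entry t, then delete it
                for c in cols[t]:
                    weight[c] -= 1
                progress = True
    return remaining


def is_stopping(cols, I) -> bool:
    """True iff the sub-matrix B_I (columns in I) has no row of weight one."""
    w = Counter(c for t in I for c in cols[t])
    return all(v != 1 for v in w.values())


# k = 2 (cells 1..3 | 4..6); entries 0 and 1 share both cells: a stopping set of size 2.
cols = [(1, 4), (1, 4), (2, 5)]
left = peel(cols)
print(left, is_stopping(cols, left))  # {0, 1} True
```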

In our case, the state matrix $B$ is divided into $k$ sub-blocks corresponding to the subtables. It is therefore natural to consider stopping matrices within a single sub-block before discussing the probability of the event that $B$ includes a stopping matrix.

Let $S^{(\ell,n)}$ be the set of $\ell \times n$ binary matrices with column weight one; i.e.,

$$ S^{(\ell,n)} = \{ (m_1, \dots, m_n) \in \{0,1\}^{\ell \times n} \mid \mathrm{wt}(m_i) = 1, \ i \in [1, n] \}, $$

where $m_i$ denotes the $i$-th column and $\mathrm{wt}(\cdot)$ represents the Hamming weight function. From this definition, it is evident that the cardinality of $S^{(\ell,n)}$ is $\ell^n$. The number of stopping matrices in $S^{(\ell,n)}$ is denoted by $z(\ell,n)$, which can be written as

$$ z(\ell,n) = \#\{ M \in S^{(\ell,n)} \mid M \text{ is a stopping matrix} \}. \tag{1} $$

By convention, $z(0,0)$ is defined to be 1.
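For very small sizes, definition (1) can be checked by exhaustive enumeration. The hypothetical helper below represents a matrix in $S^{(\ell,n)}$ by the row index of the single 1 in each column. It visits all $\ell^n$ matrices, so it is feasible only for tiny $\ell$ and $n$.

```python
from collections import Counter
from itertools import product


def z_bruteforce(ell: int, n: int) -> int:
    """Count the stopping matrices in S^(ell,n) directly from definition (1).

    A matrix in S^(ell,n) is fixed by the row r_t holding the single 1 of each
    column t, so all ell**n row assignments are enumerated.
    """
    count = 0
    for rows in product(range(ell), repeat=n):
        weights = Counter(rows)  # Hamming weights of the non-zero rows
        if all(w != 1 for w in weights.values()):
            count += 1
    return count


print([z_bruteforce(3, n) for n in range(1, 5)])  # [0, 3, 3, 21]
```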

The following recursive formula plays a key role in enumerating $z(\ell,n)$, which is required for evaluating an upper bound on the listing failure probability.

Theorem 1 (Recursive formula for $z(\ell,n)$): The recursive relation

$$ z(\ell,n) = \ell^n - \sum_{c=1}^{\min(\ell,n)} c! \binom{\ell}{c} \binom{n}{c} z(\ell-c, n-c) \tag{2} $$

holds for $\ell \ge 1$ and $n \ge 1$.
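Recursion (2) translates directly into a memoized function. The sketch uses its own choice of boundary values: $z(\ell,0) = 1$, because the empty matrix has no row of weight one, and $z(0,n) = 0$ for $n \ge 1$, because $S^{(0,n)}$ is empty. Both agree with the convention $z(0,0) = 1$. Python's exact integers avoid overflow.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def z(ell: int, n: int) -> int:
    """Number of stopping matrices in S^(ell,n), computed with recursion (2)."""
    if n == 0:
        return 1  # the empty ell x 0 matrix has no row of weight one
    if ell == 0:
        return 0  # S^(0,n) is empty for n >= 1
    non_stopping = sum(factorial(c) * comb(ell, c) * comb(n, c) * z(ell - c, n - c)
                       for c in range(1, min(ell, n) + 1))
    return ell ** n - non_stopping


print(z(4, 4), z(10, 2))  # 40 10
```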



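(Proof) Let $a(\ell,n)$ be the number of non-stopping matrices in $S^{(\ell,n)}$, i.e., $a(\ell,n) = \ell^n - z(\ell,n)$. In the following, we enumerate $a(\ell,n)$ by using a recursive relation. For a given $M \in S^{(\ell,n)}$, a pair $(i,j) \in [1,\ell] \times [1,n]$ is said to be a pivot of $M$ if $M_{ij} = 1$ and the Hamming weight of the $i$-th row of $M$ is 1. The set of pivots of $M$ is denoted by

$$ \mathrm{piv}(M) = \{ (i,j) \in [1,\ell] \times [1,n] \mid (i,j) \text{ is a pivot of } M \}. $$

Note that $M$ is a stopping matrix if and only if $\mathrm{piv}(M)$ is empty. The number of non-stopping matrices $a(\ell,n)$ can be represented by

$$ a(\ell,n) = \sum_{i=1}^{\min(\ell,n)} \# T_i^{(\ell,n)}, \tag{3} $$

where

$$ T_i^{(\ell,n)} \triangleq \{ M \in S^{(\ell,n)} \mid \#\mathrm{piv}(M) = i \}, \quad i \in [0, \min(\ell,n)]. \tag{4} $$

This is because the set of non-stopping matrices can be partitioned into the disjoint sets $T_i^{(\ell,n)}$ for $i \in [1, \min(\ell,n)]$. In the following, we prove the equality

$$ \# T_c^{(\ell,n)} = c! \binom{\ell}{c} \binom{n}{c} z(\ell-c, n-c) \tag{5} $$

for $c \in [1, \min(\ell,n)]$. Assume that $M \in T_c^{(\ell,n)}$ is given. By removing all the rows and columns corresponding to $\mathrm{piv}(M)$ from $M$, we obtain an $(\ell-c) \times (n-c)$ matrix $M'$. Namely, we delete the $i$-th row and the $j$-th column from $M$ if $(i,j) \in \mathrm{piv}(M)$. A pivot row has its only 1 in its pivot column, and a pivot column has its only 1 in its pivot row. Hence every remaining column keeps its single 1, so $M' \in S^{(\ell-c,n-c)}$, and every remaining row keeps its weight. From the assumption $M \in T_c^{(\ell,n)}$, no remaining row has weight one. Therefore the resulting matrix $M'$ must be a stopping matrix in $T_0^{(\ell-c,n-c)}$. Conversely, any choice of $c$ pivot positions combined with any stopping matrix in $T_0^{(\ell-c,n-c)}$ yields a matrix in $T_c^{(\ell,n)}$. Note that the size of $T_0^{(\ell-c,n-c)}$ is given by $z(\ell-c,n-c)$. Therefore, the size of $T_c^{(\ell,n)}$ is the product of the number of possible ways to choose $\mathrm{piv}(M)$ and $z(\ell-c,n-c)$. A simple combinatorial argument applies: we choose $c$ rows, choose $c$ columns, and match them one-to-one. The number of possible ways to choose $\mathrm{piv}(M)$ is therefore $c! \binom{\ell}{c} \binom{n}{c}$. As a result, we have the equality (5). Combining (3) and (5) with $a(\ell,n) = \ell^n - z(\ell,n)$, the claim of the theorem is obtained. □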
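Equality (5), which carries the proof, can be spot-checked by brute force for small $\ell$ and $n$. The check counts the matrices of $S^{(\ell,n)}$ by their number of pivots and compares each count with $c! \binom{\ell}{c} \binom{n}{c} z(\ell-c, n-c)$. The helpers below are hypothetical, and the check covers only the sizes it enumerates.

```python
from collections import Counter
from itertools import product
from math import comb, factorial


def pivot_count(rows, ell: int) -> int:
    """#piv(M) for the matrix whose column t has its 1 in row rows[t].

    Each row of weight one contributes exactly one pivot.
    """
    w = Counter(rows)
    return sum(1 for r in range(ell) if w[r] == 1)


def z_brute(ell: int, n: int) -> int:
    # Stopping matrices are exactly the matrices without a pivot.
    return sum(1 for rows in product(range(ell), repeat=n) if pivot_count(rows, ell) == 0)


def check_eq5(ell: int, n: int) -> bool:
    """Compare #T_c^(ell,n) with c! C(ell,c) C(n,c) z(ell-c, n-c) for every c."""
    hist = Counter(pivot_count(rows, ell) for rows in product(range(ell), repeat=n))
    return all(hist[c] == factorial(c) * comb(ell, c) * comb(n, c) * z_brute(ell - c, n - c)
               for c in range(min(ell, n) + 1))


print(all(check_eq5(ell, n) for ell in range(1, 5) for n in range(1, 5)))  # True
```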
For some special combinations of $\ell$ and $n$, $z(\ell,n)$ has a simple expression as follows.

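$$ z(\ell,0) = 1 \quad (\ell \ge 1), \tag{6} $$
$$ z(0,n) = 0 \quad (n \ge 1), \tag{7} $$
$$ z(\ell,1) = 0 \quad (\ell \ge 1), \tag{8} $$
$$ z(1,n) = 1 \quad (n \ge 2). \tag{9} $$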

These expressions can be easily proved from the definitions of the stopping matrix and of $S^{(\ell,n)}$. The recursive formula (2) enables us to evaluate $z(\ell,n)$ efficiently, and these simple expressions can be used as boundary conditions for the recursive evaluation process.

Table I presents the values of $z(\ell,n)$ for $(\ell,n) \in [1,10]^2$. These values are computed with the recursive formula (2). Note that $S^{(\ell,n)}$ contains $10^{10}$ matrices when $\ell = n = 10$. A naive enumeration scheme that generates all the matrices in $S^{(\ell,n)}$ may therefore be computationally difficult even for such small parameters.
TABLE I

Values of $z(\ell,n)$: Number of Stopping Matrices in $S^{(\ell,n)}$
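The entries of Table I can be regenerated with a bottom-up version of recursion (2). This minimal sketch fills $z(\ell,n)$ row by row, using the boundary values $z(\ell,0) = 1$ and $z(0,n) = 0$ for $n \ge 1$. For $(\ell,n) \in [1,10]^2$ it evaluates fewer than 400 summation terms. A naive scan of $S^{(10,10)}$, by contrast, would visit $10^{10}$ matrices.

```python
from math import comb, factorial


def z_table(L: int, N: int):
    """Bottom-up evaluation of recursion (2): z[ell][n] for 0 <= ell <= L, 0 <= n <= N."""
    z = [[0] * (N + 1) for _ in range(L + 1)]  # z[0][n] = 0 for n >= 1
    for ell in range(L + 1):
        z[ell][0] = 1  # empty matrix (includes z(0,0) = 1)
    for ell in range(1, L + 1):
        for n in range(1, N + 1):
            z[ell][n] = ell ** n - sum(
                factorial(c) * comb(ell, c) * comb(n, c) * z[ell - c][n - c]
                for c in range(1, min(ell, n) + 1))
    return z


table = z_table(10, 10)
for ell in range(1, 11):  # rows: ell = 1..10, columns: n = 1..10
    print(ell, *table[ell][1:])
```

Small entries can be cross-checked by hand. For example, $z(\ell,2) = \ell$, because two columns form a stopping matrix only when their 1s share a row.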
B. Listing Failure Probability and its Bound
The set of all the state matrices is defined as

$$ B^{(\ell,n,k)} \triangleq \{ (M_1^T, \dots, M_k^T)^T \mid M_i \in S^{(\ell,n)}, \ i \in [1,k] \}. \tag{10} $$

The cardinality of $B^{(\ell,n,k)}$ is $\ell^{nk}$. Following the scenario discussed in the previous section, we define a probability space by assigning equal probability $1/\ell^{nk}$ to each element of $B^{(\ell,n,k)}$.

Let $P_f(\ell,n,k)$ denote the listing failure probability, i.e., the probability that the ListEntries() operation fails to list all the entries in the IBLT. The next theorem provides an upper bound on $P_f(\ell,n,k)$.

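Theorem 2 (Upper bound on listing failure probability): For given $\ell \ge 1$, $n \ge 1$, $k \ge 1$, the listing failure probability $P_f(\ell,n,k)$ can be upper bounded by

$$ P_f(\ell,n,k) \le \sum_{i=2}^{n} \binom{n}{i} \left( \frac{z(\ell,i)}{\ell^i} \right)^k. \tag{11} $$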
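(Proof) The peeling process of ListEntries() fails to recover all the entries in the IBLT if and only if $B \in B^{(\ell,n,k)}$ contains a stopping matrix as its sub-matrix. Thus, $P_f(\ell,n,k)$ can be characterized as

$$ P_f(\ell,n,k) = \Pr[B \text{ includes a stopping matrix}]. \tag{12} $$

For an index set $I \in 2^{[1,n]}$, let $B_I$ be the sub-matrix of $B$ consisting of the columns of $B$ with indices in $I$. If $B_I$ is a stopping matrix, then the index set $I$ is said to be a stopping set. The probability $P_f(\ell,n,k)$ can be upper bounded as follows:

$$ \begin{aligned} P_f(\ell,n,k) &= \Pr[B \text{ includes a stopping matrix}] \\ &= \Pr\Big[ \bigcup_{I \in 2^{[1,n]} \setminus \{\emptyset\}} \{ B_I \text{ is a stopping matrix} \} \Big] \\ &\le \sum_{I \in 2^{[1,n]} \setminus \{\emptyset\}} \Pr[I \text{ is a stopping set}]. \end{aligned} \tag{13} $$

The last inequality is due to the union bound. Under the probability space defined on $B^{(\ell,n,k)}$, the $k$ blocks of $B_I$ are independent and each is uniformly distributed over $S^{(\ell,\#I)}$. Moreover, $B_I$ has no row of weight one if and only if none of its blocks does. The probability that $I$ is a stopping set is therefore given by

$$ \Pr[I \text{ is a stopping set}] = \left( \frac{z(\ell,\#I)}{\ell^{\#I}} \right)^k. \tag{14} $$

By using this equality, we have the following upper bound:

$$ \begin{aligned} P_f(\ell,n,k) &\le \sum_{I \in 2^{[1,n]} \setminus \{\emptyset\}} \Pr[I \text{ is a stopping set}] \\ &= \sum_{i=1}^{n} \sum_{I \in 2^{[1,n]} \setminus \{\emptyset\},\ \#I = i} \Pr[I \text{ is a stopping set}] \\ &= \sum_{i=2}^{n} \binom{n}{i} \left( \frac{z(\ell,i)}{\ell^i} \right)^k. \end{aligned} \tag{15} $$

In the last equality, we used the fact $z(\ell,1) = 0$. □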
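Theorem 2 can be sanity-checked on tiny parameters. The check computes $P_f(\ell,n,k)$ exactly by enumerating all $\ell^{nk}$ equally likely state matrices and running the peeling process on each. It then compares the result with the right-hand side of (11). The sketch uses exact fractions and hypothetical helper names. The enumeration is feasible only for very small $\ell$, $n$, $k$.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb, factorial


@lru_cache(maxsize=None)
def z(ell: int, n: int) -> int:
    # Recursion (2) with boundary values z(ell, 0) = 1 and z(0, n) = 0 (n >= 1).
    if n == 0:
        return 1
    if ell == 0:
        return 0
    return ell ** n - sum(factorial(c) * comb(ell, c) * comb(n, c) * z(ell - c, n - c)
                          for c in range(1, min(ell, n) + 1))


def union_bound(ell: int, n: int, k: int) -> Fraction:
    """Right-hand side of (11) as an exact fraction."""
    return sum((comb(n, i) * Fraction(z(ell, i), ell ** i) ** k
                for i in range(2, n + 1)), Fraction(0))


def peeling_fails(cols) -> bool:
    """cols[t] = cells of entry t; True iff peeling leaves an unresolved entry."""
    remaining = set(range(len(cols)))
    while True:
        w = Counter(c for t in remaining for c in cols[t])
        # Columns with a weight-one row can all be removed in the same round.
        resolvable = {t for t in remaining if any(w[c] == 1 for c in cols[t])}
        if not resolvable:
            return bool(remaining)
        remaining -= resolvable


def exact_failure_probability(ell: int, n: int, k: int) -> Fraction:
    """P_f(ell,n,k) by enumerating all ell**(n*k) equally likely state matrices."""
    per_entry = list(product(range(ell), repeat=k))  # row chosen in each block
    fails = total = 0
    for choice in product(per_entry, repeat=n):
        cols = [tuple(u * ell + r for u, r in enumerate(rows)) for rows in choice]
        fails += peeling_fails(cols)
        total += 1
    return Fraction(fails, total)


for params in [(2, 3, 2), (3, 3, 2), (4, 3, 2)]:
    print(params, exact_failure_probability(*params), union_bound(*params))
```

For $n = 2$ the two sides coincide, because the only possible stopping set is the pair itself. For larger $n$ the union bound counts overlapping stopping sets more than once.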

IV. Computer Experiments 

In this section, we present several results of computer experiments and of the numerical evaluation of the upper bound presented in the previous section.

To examine the tightness of the bound, Figure 1 presents curves of the listing failure probability obtained by computer experiments (dashed line) and of the upper bound (solid line). These curves are plotted as functions of the number of cells $m$. The number of entries is $n = 210$ and the symbol size of the key is $b = 32$. In the computer experiments, the number of trials is $10^6$. SHA-1 [13] was used as the hash function, and the number of hash functions is $k = 3$. We used pseudorandom 32-bit numbers for pseudorandom key-value pairs. It can be observed that the upper bound becomes a fairly tight estimate as the number of cells $m$ increases. As in the case of LDPC codes, the error curve in Figure 1 exhibits both the waterfall and the error floor phenomena. This result indicates that the upper bound precisely captures the error floor behavior of the listing failure probability.
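A Monte Carlo estimate of the listing failure probability can be sketched under the uniform model of Section II-C. The sketch draws cell indices directly instead of hashing 32-bit keys with SHA-1 as in the experiments above. This substitution is an assumption, so the sketch reproduces the model rather than the exact experimental setup. The names are hypothetical, and $m = k\ell$.

```python
import random
from collections import Counter


def peeling_fails(cols) -> bool:
    """True iff peeling leaves an unresolved entry (cols[t] = cells of entry t)."""
    remaining = set(range(len(cols)))
    while True:
        w = Counter(c for t in remaining for c in cols[t])
        resolvable = {t for t in remaining if any(w[c] == 1 for c in cols[t])}
        if not resolvable:
            return bool(remaining)
        remaining -= resolvable


def simulate_failure_rate(n: int, k: int, ell: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the listing failure probability (m = k * ell cells).

    Each entry picks one uniform cell per subtable, as in the model of Section II-C.
    """
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        cols = [tuple(u * ell + rng.randrange(ell) for u in range(k)) for _ in range(n)]
        fails += peeling_fails(cols)
    return fails / trials
```

For a cell count $m$ divisible by 3, `simulate_failure_rate(210, 3, m // 3, trials)` mirrors the parameters of Figure 1. Resolving an error floor near $10^{-6}$ requires well over $10^6$ trials.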

The upper bound reveals a tradeoff between the waterfall and the error floor. Figure 2 presents the upper bounds for $k \in [3,6]$, where the number of entries is $n = 100$. Each curve of the upper bound is plotted as a function of the number of cells $m$. We can observe that the listing failure probability in the error floor region decreases as the number of hash functions $k$ increases. On the other hand, increasing $k$ pushes the waterfall to the right.

From the upper bound and some experimental results, we see that stopping sets of size 2 dominate the error floor behavior. Figure 3 presents the upper bound, the asymptote $P_2(\ell,n,k)$ defined by

$$ P_2(\ell,n,k) = \binom{n}{2} \frac{1}{\ell^k}, \tag{16} $$

and the experimental values of the listing failure probability. The result suggests that the probability of occurrence of a stopping set of size 2 determines the depth of the error floor.
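Since $z(\ell,2) = \ell$ (two columns form a stopping matrix only when their 1s share a row), the $i = 2$ term of (11) equals $\binom{n}{2}(1/\ell)^k$. This is exactly $P_2(\ell,n,k)$ in (16), the expected number of pairs of entries whose $k$ hash values all collide. The following sketch, with hypothetical helper names, checks this identity in exact arithmetic.

```python
from fractions import Fraction
from math import comb


def p2(ell: int, n: int, k: int) -> Fraction:
    """Asymptote (16): C(n, 2) / ell**k."""
    return Fraction(comb(n, 2), ell ** k)


def bound_term_i2(ell: int, n: int, k: int) -> Fraction:
    # i = 2 term of (11); z(ell, 2) = ell because both 1s must share a row.
    return comb(n, 2) * Fraction(ell, ell ** 2) ** k


# n = 210, k = 3: each tenfold increase of ell lowers P_2 by a factor of 1000.
for ell in (100, 1000):
    print(ell, float(p2(ell, 210, 3)), p2(ell, 210, 3) == bound_term_i2(ell, 210, 3))
```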

V. SS Avoiding Hash Function 

We have seen that stopping sets of size 2 dominate the behavior of the listing failure probability in the error floor region. A stopping set of size 2 occurs when the $k$ hash values of two distinct keys collide; i.e.,

$$ (h_1(a), h_2(a), \dots, h_k(a)) = (h_1(b), h_2(b), \dots, h_k(b)) \tag{17} $$

for $a \ne b$. If this type of collision can be prevented, the error floor performance is expected to improve.

Fig. 1. Comparison of the listing failure probability: experimental values and upper bound (n = 210, k = 3, b = 32).

Fig. 2. Comparison of the upper bound on listing failure probability: 3 hashes, 4 hashes, 5 hashes and 6 hashes (n = 100).

Fig. 3. Comparison of the listing failure probability: experimental values, upper bound and asymptote $P_2(\ell,n,k)$ (n = 210, k = 3, b = 32).

The SS avoiding hash functions defined here are designed so that the collisions (17) are avoided. In the following discussion, we further assume the uniqueness of the keys registered in the IBLT. Namely, neither the insertion of distinct entries with the same key nor multiple insertions of the same key-value pair is allowed. This assumption may be natural for most applications, such as set reconciliation.

Let a hash function $h$ be a bijective map from $\{0,1\}^b$ to $\{0,1\}^{sk}$, where $b = sk$. The SS avoiding hash functions $(h_1, \dots, h_k)$ are simply defined by partitioning the output $sk$-tuple of $h$ into $k$ binary $s$-tuples; i.e., $h_i(x)$ is given by

$$ h_i(x) = q_i + (i-1)2^s + 1, \quad i \in [1,k], \tag{18} $$

where $(q_1, \dots, q_k) = h(x)$, with each $q_i \in \{0,1\}^s$ regarded as an integer in $[0, 2^s - 1]$. Note that $m/k = 2^s$ holds; i.e., each subtable contains $2^s$ cells. Because $h$ is bijective and the keys in the IBLT are assumed to be unique, it is evident that a collision (17) does not occur. This means that occurrences of stopping sets of size 2 can be completely prevented. Note that the use of SS avoiding hash functions introduces a restriction on the system parameters, namely $b = sk$. This inflexibility can be considered the price to be paid for lowering the error floor.
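Equation (18) can be sketched as follows. The sketch assumes integer keys and takes the identity map as the bijection $h$, as in the experiment of Figure 4. It splits the $sk$-bit output with $q_1$ taken from the most significant bits. That ordering is an assumption, and any fixed ordering gives the same freedom from collisions.

```python
def ss_avoiding_hashes(x: int, s: int, k: int, h=lambda v: v):
    """Cell indices h_1(x), ..., h_k(x) of eq. (18), 1-based.

    h must be a bijection on {0,1}^(s*k) (identity by default, as in Fig. 4).
    Its s*k-bit output is cut into k chunks q_1, ..., q_k (q_1 = most significant
    bits; this ordering is an assumption) and q_i is placed in subtable i.
    """
    v = h(x)
    mask = (1 << s) - 1
    return tuple(((v >> ((k - i) * s)) & mask) + (i - 1) * (1 << s) + 1
                 for i in range(1, k + 1))


s, k = 2, 3  # b = s * k = 6-bit keys, m = k * 2**s = 12 cells
tuples = [ss_avoiding_hashes(x, s, k) for x in range(2 ** (s * k))]
print(len(set(tuples)) == len(tuples))  # True: no two keys collide as in (17)
```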

It should be remarked that the probabilistic model assumed in Section II cannot be directly applied to the system presented in this section. This is because the assumption on the uniqueness of the keys introduces weak correlations between the stored entries. Although these differences must be taken into account, the analysis presented in the previous sections may still be useful for predicting the performance of ListEntries() with SS avoiding hash functions if $b$ is large enough.

Figure 4 presents the results of a computer experiment on the SS avoiding hash functions. The identity map was used as the bijective map. Two curves of the listing failure probability are plotted. The first corresponds to a conventional hash function, and the second corresponds to the SS avoiding hash function with key symbol size $b = 3s$. In both cases, the number of entries is $n = 210$ and the number of hash functions is $k = 3$. We can observe that the SS avoiding hash function reduces the listing failure probability in the error floor region. Furthermore, the upper bound nearly captures the error floor behavior of the listing failure probability in this setting.

VI. Conclusion 

In this paper, we presented a finite length performance analysis of the listing failure probability. It may be useful for designing a system or an algorithm that includes the IBLT as a building block. The recursive formula presented in Section III will be a useful tool for finite length analysis. In Section IV, we have seen that the error floor performance can be improved by increasing the number of hash functions, at the cost of degraded waterfall performance. From the results shown in Section V, we can expect that appropriately designed SS avoiding hash functions can improve the error floor performance without sacrificing the waterfall performance.
Fig. 4. Comparison of the listing failure probability: conventional hash function and SS avoiding hash function (n = 210, k = 3).



